Guide for setting up Azure
This guide helps you set up your Microsoft Azure environment to run with DataStori.
DataStori uses Azure Container Instances (ACI) to run the pipeline code. DataStori integrates with your Azure subscription using a Service Principal with specific permissions.
Requirements
Please be ready with the following resources:
Networking Requirements
- Virtual Network (VNet) Name: The VNet where you want to run your code.
- Subnet Name: The specific subnet within the VNet for the container instances.
- Network Security Group (NSG) Name: The security group you want to apply to the container instances running the pipeline code.
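Before sharing these names, it can help to confirm that they resolve to real resources. The following is a minimal sketch using the Azure CLI, not part of the DataStori setup itself. The function name and its arguments are hypothetical, and it assumes the VNet and the NSG live in the same resource group.

```shell
# Hypothetical helper: print the resource IDs of the subnet and the NSG.
# Assumes both live in the same resource group; all arguments are placeholders.
show_network_ids() {
  local rg="$1"
  local vnet="$2"
  local subnet="$3"
  local nsg="$4"

  # Resolve the subnet inside the VNet; fails if either name is wrong.
  az network vnet subnet show \
    --resource-group "$rg" --vnet-name "$vnet" --name "$subnet" \
    --query id --output tsv || return 1

  # Resolve the Network Security Group.
  az network nsg show \
    --resource-group "$rg" --name "$nsg" \
    --query id --output tsv || return 1
}
```

For example, `show_network_ids my-rg my-vnet my-subnet my-nsg` prints two resource IDs if all names are correct (the names here are placeholders).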
Service Requirements
- Azure Container Instances (ACI): DataStori will spin up container instances to run the pipeline and store the output in Azure Blob Storage. You just need a Resource Group where these resources will be created.
- Azure Blob Storage: Data is stored here. Please be ready with the Storage Account Name and the Container Name where you want the data to be stored.
- RDBMS (optional): Needed only if you also want DataStori to push data to a relational database.
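The storage details can be checked from the Azure CLI as well. This sketch prints the storage account's region and whether the container exists. The function name and arguments are hypothetical, and `--auth-mode login` assumes your signed-in identity has data-plane access to the account (for example a Storage Blob Data Reader role).

```shell
# Hypothetical helper: verify the storage account and container names.
check_storage() {
  local rg="$1"
  local account="$2"
  local container="$3"

  # Print the region (location) of the storage account.
  az storage account show \
    --resource-group "$rg" --name "$account" \
    --query location --output tsv || return 1

  # Report whether the container exists, using your Azure AD login
  # rather than an account key.
  az storage container exists \
    --account-name "$account" --name "$container" \
    --auth-mode login --query exists --output tsv
}
```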
Identity / Service Principal
A Service Principal is an application identity within your Azure Active Directory tenant. We will create one for DataStori and assign it a custom role with the minimum required permissions.
Create a Service Principal:
- Navigate to Azure Active Directory (now named Microsoft Entra ID in the portal) -> App registrations -> New registration.
- Give it a name, like datastori-integration-sp.
- Choose "Accounts in this organizational directory only".
- Click Register.
- From the overview page, note down the Application (client) ID and Directory (tenant) ID.
- Go to Certificates & secrets, click New client secret, give it a description, and copy the Value immediately. You won't be able to see it again.
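If you prefer the command line, the portal steps above can be approximated with the Azure CLI. This is a sketch under assumptions, not the only way to do it: the function name and the secret's display name (`datastori-secret`) are hypothetical, and you need permission to register applications in your tenant.

```shell
# Hypothetical helper: register the app, create its service principal,
# and issue a client secret.
create_datastori_sp() {
  local name="${1:-datastori-integration-sp}"
  local app_id

  # Register a single-tenant application and capture its Application (client) ID.
  app_id="$(az ad app create --display-name "$name" \
    --sign-in-audience AzureADMyOrg --query appId --output tsv)" || return 1

  # Create the service principal that backs the app registration.
  az ad sp create --id "$app_id" --output none || return 1

  # Create a client secret. The JSON output includes appId, password and tenant.
  # The password is shown only once, so store it securely.
  az ad app credential reset --id "$app_id" \
    --display-name "datastori-secret" --output json
}
```

In the output, `appId` corresponds to the Application (client) ID, `tenant` to the Directory (tenant) ID, and `password` to the client secret Value.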
Create a Custom Role Definition: Create a custom role that grants DataStori permissions to manage ACI and access storage. You can create this using Azure CLI or by uploading a JSON file in the IAM section of your subscription.
Save the following JSON as DataStori-Role-Definition.json. Replace <YOUR_SUBSCRIPTION_ID> with your actual subscription ID.
{
  "Name": "DataStori ACI Runner",
  "IsCustom": true,
  "Description": "Allows DataStori to run container instances and access storage.",
  "Actions": [
    "Microsoft.ContainerInstance/containerGroups/write",
    "Microsoft.ContainerInstance/containerGroups/read",
    "Microsoft.ContainerInstance/containerGroups/delete",
    "Microsoft.ContainerInstance/containerGroups/start/action",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
    "Microsoft.Storage/storageAccounts/listKeys/action"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/<YOUR_SUBSCRIPTION_ID>"
  ]
}
To create the role, run this Azure CLI command:
az role definition create --role-definition @DataStori-Role-Definition.json
Assign the Custom Role:
- Navigate to the Resource Group where your VNet and Storage Account are located.
- Go to Access control (IAM) -> Add -> Add role assignment.
- Select the "DataStori ACI Runner" role you just created.
- In the Select box, search for the datastori-integration-sp Service Principal you created.
- Click Save.
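The placeholder substitution and the role assignment can also be scripted. The sketch below assumes the role definition file still contains the literal `<YOUR_SUBSCRIPTION_ID>` placeholder and that the role is assigned at resource-group scope, as in the portal steps. Both function names are hypothetical.

```shell
# Hypothetical helper: print the role definition with the subscription ID filled in.
render_role_definition() {
  local subscription_id="$1"
  local template="${2:-DataStori-Role-Definition.json}"
  sed "s|<YOUR_SUBSCRIPTION_ID>|$subscription_id|g" "$template"
}

# Hypothetical helper: assign the custom role to the service principal,
# scoped to a single resource group.
assign_datastori_role() {
  local subscription_id="$1"
  local resource_group="$2"
  local app_id="$3"

  az role assignment create \
    --assignee "$app_id" \
    --role "DataStori ACI Runner" \
    --scope "/subscriptions/$subscription_id/resourceGroups/$resource_group"
}
```

A possible usage: `render_role_definition "$(az account show --query id --output tsv)" > role.json`, then `az role definition create --role-definition @role.json`, followed by `assign_datastori_role` with your subscription ID, resource group name, and the Application (client) ID.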
Logging (Optional)
By default, DataStori will write pipeline logs to Azure Monitor Logs. If you want to customize the logging destination, please share the details of your Log Analytics Workspace.
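The guide does not spell out which workspace details are needed, so the following is only a sketch that collects the likely candidates (name, workspace ID, and region) with the Azure CLI. The function name and the choice of fields are assumptions.

```shell
# Hypothetical helper: print basic details of a Log Analytics Workspace.
# customerId is the property that holds the workspace ID.
show_workspace_details() {
  local rg="$1"
  local workspace="$2"

  az monitor log-analytics workspace show \
    --resource-group "$rg" --workspace-name "$workspace" \
    --query "{name:name, workspaceId:customerId, region:location}" \
    --output json
}
```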
Summary
To proceed with the Azure setup, please provide the following:
- Directory (Tenant) ID
- Application (Client) ID of the Service Principal
- Client Secret Value for the Service Principal
- Subscription ID
- Resource Group Name
- VNet Name
- Subnet Name
- Network Security Group Name
- Storage Account Name
- Storage Container Name
- Storage Account Region
- Log Analytics Workspace details (optional)
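To avoid sending an incomplete set of values, the list above can be turned into a quick local check. This is a sketch only: the variable names are hypothetical, one per required item, and the optional Log Analytics Workspace is left out. Take care not to leave the client secret in shell history or shared files.

```shell
# Hypothetical variable names, one per required item in the list above.
REQUIRED_VARS="TENANT_ID CLIENT_ID CLIENT_SECRET SUBSCRIPTION_ID RESOURCE_GROUP VNET_NAME SUBNET_NAME NSG_NAME STORAGE_ACCOUNT STORAGE_CONTAINER STORAGE_REGION"

# Report which required values are still unset or empty.
check_required_values() {
  local missing=""
  local var
  local value

  for var in $REQUIRED_VARS; do
    # Look up the value of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    [ -n "$value" ] || missing="$missing $var"
  done

  if [ -n "$missing" ]; then
    echo "Missing:$missing"
    return 1
  fi
  echo "All required values are set"
}
```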